# Multimodal Input

Gemma 3 12b Pt Qat Q4 0 Gguf

Gemma 3 is a lightweight open multimodal model from Google. It accepts text and image input, produces text output, offers a 128K-token context window, and supports more than 140 languages.

3DTopia-XL is a diffusion Transformer built on PrimX, an efficient 3D representation, and can rapidly generate high-quality 3D assets.

Sam2 Hiera Base Plus

SAM 2 is a foundation model developed by FAIR for promptable visual segmentation in images and videos: it segments the target object efficiently based on user-supplied prompts.

Image Segmentation


© 2025 AIbase